{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "from IPython.core.interactiveshell import InteractiveShell\n",
    "InteractiveShell.ast_node_interactivity='all'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# 图像卷积"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 互相关运算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 严格来说，卷积层是个错误的叫法，因为它所表达的运算其实是**互相关运算**（cross-correlation），而不是卷积运算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 在卷积层中，输入张量和核张量通过互相关运算产生输出张量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 暂时忽略通道（第三维）这一情况，在下图中，输入是高度为3、宽度为3的二维张量（即形状为3×\n",
    "3\n",
    "）。卷积核的高度和宽度都是\n",
    "2\n",
    "，而卷积核窗口（或卷积窗口）的形状由内核的高度和宽度决定（即\n",
    "2\n",
    "×\n",
    "2\n",
    "）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<center><img src=\"../img/6_convolutional_neural_networks/correlation.svg\" width=100%></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 上图中，输出张量的四个元素由二维互相关运算得到，输出高度为$2$、宽度为$2$，如下所示：\n",
    "\n",
    "$$\n",
    "0\\times0+1\\times1+3\\times2+4\\times3=19,\\\\\n",
    "1\\times0+2\\times1+4\\times2+5\\times3=25,\\\\\n",
    "3\\times0+4\\times1+6\\times2+7\\times3=37,\\\\\n",
    "4\\times0+5\\times1+7\\times2+8\\times3=43.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 输出大小等于输入大小$n_h \\times n_w$减去卷积核大小$k_h \\times k_w$，即：\n",
    "\n",
    "$$(n_h-k_h+1) \\times (n_w-k_w+1).$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.202461Z",
     "iopub.status.busy": "2022-07-31T02:29:06.202245Z",
     "iopub.status.idle": "2022-07-31T02:29:06.207535Z",
     "shell.execute_reply": "2022-07-31T02:29:06.206895Z"
    },
    "origin_pos": 3,
    "slideshow": {
     "slide_type": "slide"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "def corr2d(X, K):\n",
    "    \"\"\"计算二维互相关运算\n",
    "        X是输入张量\n",
    "        K是卷积核张量\n",
    "    \"\"\"\n",
    "    h, w = K.shape # 核的形状\n",
    "    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1)) # 输出的形状\n",
    "    for i in range(Y.shape[0]):\n",
    "        for j in range(Y.shape[1]):\n",
    "            Y[i, j] = (X[i:i + h, j:j + w] * K).sum() # 卷积\n",
    "    return Y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.210182Z",
     "iopub.status.busy": "2022-07-31T02:29:06.209977Z",
     "iopub.status.idle": "2022-07-31T02:29:06.218527Z",
     "shell.execute_reply": "2022-07-31T02:29:06.217894Z"
    },
    "origin_pos": 6,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[19., 25.],\n",
       "        [37., 43.]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证上述二维互相关运算的输出\n",
    "X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])\n",
    "K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])\n",
    "corr2d(X, K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 实现二维卷积层"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 卷积层中的两个被训练的参数是卷积核权重和偏置"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 在训练基于卷积层的模型时，也随机初始化卷积核权重"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.221680Z",
     "iopub.status.busy": "2022-07-31T02:29:06.221238Z",
     "iopub.status.idle": "2022-07-31T02:29:06.225811Z",
     "shell.execute_reply": "2022-07-31T02:29:06.225186Z"
    },
    "origin_pos": 9,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "# 实现二维卷积层\n",
    "# 在`__init__`构造函数中，将`weight`和`bias`声明为两个模型参数\n",
    "# 前向传播函数调用`corr2d`函数并添加偏置\n",
    "\n",
    "class Conv2D(nn.Module):\n",
    "    def __init__(self, kernel_size):\n",
    "        super().__init__()\n",
    "        self.weight = nn.Parameter(torch.rand(kernel_size))\n",
    "        self.bias = nn.Parameter(torch.zeros(1))\n",
    "\n",
    "    def forward(self, x):\n",
    "        return corr2d(x, self.weight) + self.bias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 高度和宽度分别为$h$和$w$的卷积核可以被称为$h \\times w$卷积或$h \\times w$卷积核\n",
    "- 也将带有$h \\times w$卷积核的卷积层称为$h \\times w$卷积层"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 卷积层的一个简单应用：检测图像中不同颜色的边缘"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 通过找到像素变化的位置，来检测图像中不同颜色的边缘"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### 生成X和y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 构造一个$6\\times 8$像素的黑白图像。中间四列为黑色（$0$），其余像素为白色（$1$）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.228717Z",
     "iopub.status.busy": "2022-07-31T02:29:06.228295Z",
     "iopub.status.idle": "2022-07-31T02:29:06.233997Z",
     "shell.execute_reply": "2022-07-31T02:29:06.233381Z"
    },
    "origin_pos": 12,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "        [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "        [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "        [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "        [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "        [1., 1., 0., 0., 0., 0., 1., 1.]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.ones((6, 8))\n",
    "X[:, 2:6] = 0\n",
    "X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 构造一个高度为$1$、宽度为$2$的卷积核`K`\n",
    "- 当进行互相关运算时，如果水平相邻的两元素相同，则输出为零，否则输出为非零"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.236985Z",
     "iopub.status.busy": "2022-07-31T02:29:06.236563Z",
     "iopub.status.idle": "2022-07-31T02:29:06.240094Z",
     "shell.execute_reply": "2022-07-31T02:29:06.239462Z"
    },
    "origin_pos": 15,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "K = torch.tensor([[1.0, -1.0]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "- 对参数`X`（输入）和`K`（卷积核）执行互相关运算。\n",
    "\n",
    "- 输出`Y`中的1代表从白色到黑色的边缘，-1代表从黑色到白色的边缘\n",
    "\n",
    "- 其他情况的输出为$0$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.243087Z",
     "iopub.status.busy": "2022-07-31T02:29:06.242577Z",
     "iopub.status.idle": "2022-07-31T02:29:06.248745Z",
     "shell.execute_reply": "2022-07-31T02:29:06.248123Z"
    },
    "origin_pos": 17,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y = corr2d(X, K)\n",
    "Y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.251821Z",
     "iopub.status.busy": "2022-07-31T02:29:06.251251Z",
     "iopub.status.idle": "2022-07-31T02:29:06.257279Z",
     "shell.execute_reply": "2022-07-31T02:29:06.256665Z"
    },
    "origin_pos": 19,
    "slideshow": {
     "slide_type": "slide"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0.]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将输入的二维图像转置，再进行如上的互相关运算\n",
    "\n",
    "corr2d(X.T, K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 卷积核`K`只可以检测垂直边缘，无法检测水平边缘"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### 学习由`X`生成`Y`的卷积核"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 当有了更复杂数值的卷积核，或者连续的卷积层时，是否可以学习由`X`生成`Y`的卷积核？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- 先构造一个卷积层，并将其卷积核初始化为随机张量\n",
    "- 在每次迭代中，比较`Y`与卷积层输出的平方误差，然后计算梯度来更新卷积核"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.260238Z",
     "iopub.status.busy": "2022-07-31T02:29:06.259812Z",
     "iopub.status.idle": "2022-07-31T02:29:06.274613Z",
     "shell.execute_reply": "2022-07-31T02:29:06.273942Z"
    },
    "origin_pos": 22,
    "slideshow": {
     "slide_type": "slide"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "# 构造一个二维卷积层，它具有1个输出通道和形状为（1，2）的卷积核\n",
    "\n",
    "conv2d = nn.Conv2d(1,1, kernel_size=(1, 2), bias=False) # 输入和输出通道都为1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)\n",
    "```\n",
    "\n",
    "- `in_channels`：`int`类型，输入通道数量\n",
    "- `kernel_size`：`tuple`类型，卷积核形状"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# 这个二维卷积层使用四维输入和输出格式（批量大小、通道、高度、宽度），\n",
    "# 其中批量大小和通道数都为1\n",
    "\n",
    "X = X.reshape((1, 1, 6, 8))\n",
    "Y = Y.reshape((1, 1, 6, 7))\n",
    "lr = 3e-2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 2, loss 11.223\n",
      "epoch 4, loss 2.605\n",
      "epoch 6, loss 0.733\n",
      "epoch 8, loss 0.244\n",
      "epoch 10, loss 0.091\n"
     ]
    }
   ],
   "source": [
    "for i in range(10):\n",
    "    Y_hat = conv2d(X)\n",
    "    # 计算平方误差\n",
    "    l = (Y_hat - Y) ** 2\n",
    "    conv2d.zero_grad()\n",
    "    # 反向传播\n",
    "    l.sum().backward()\n",
    "    # 迭代卷积核\n",
    "    conv2d.weight.data[:] -= lr * conv2d.weight.grad\n",
    "    if (i + 1) % 2 == 0:\n",
    "        print(f'epoch {i+1}, loss {l.sum():.3f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-31T02:29:06.277496Z",
     "iopub.status.busy": "2022-07-31T02:29:06.277286Z",
     "iopub.status.idle": "2022-07-31T02:29:06.282787Z",
     "shell.execute_reply": "2022-07-31T02:29:06.282165Z"
    },
    "origin_pos": 26,
    "slideshow": {
     "slide_type": "fragment"
    },
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 1.0150, -0.9551]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 所学的卷积核的权重张量\n",
    "conv2d.weight.data.reshape(1,2)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "幻灯片",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "rise": {
   "autolaunch": false,
   "enable_chalkboard": true,
   "scroll": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  },
  "vscode": {
   "interpreter": {
    "hash": "34418ffb6f02e522390adb0e13441cc75f901cd11cccb4f6f613643b4b4d2a0b"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
